Fast Online EM for Big Topic Modeling
Similar resources
Towards Big Topic Modeling
To solve the big topic modeling problem, we need to reduce both time and space complexities of batch latent Dirichlet allocation (LDA) algorithms. Although parallel LDA algorithms on the multi-processor architecture have low time and space complexities, their communication costs among processors often scale linearly with the vocabulary size and the number of topics, leading to a serious scalabi...
Online Belief Propagation for Topic Modeling
Not only can online topic modeling algorithms extract topics from big data streams with constant memory requirements, but they can also detect topic shifts as the data stream flows. Fast convergence speed is a desired property for batch learning topic models such as latent Dirichlet allocation (LDA), which can further facilitate developing fast online topic modeling algorithms for big data streams. ...
Why ADAGRAD Fails for Online Topic Modeling
Online topic modeling, i.e., topic modeling with stochastic variational inference, is a powerful and efficient technique for analyzing large datasets, and ADAGRAD is a widely-used technique for tuning learning rates during online gradient optimization. However, these two techniques do not work well together. We show that this is because ADAGRAD uses accumulation of previous gradients as the lea...
Online Knowledge-Based Model for Big Data Topic Extraction
Lifelong machine learning (LML) models learn with experience maintaining a knowledge-base, without user intervention. Unlike traditional single-domain models, they can easily scale up to explore big data. The existing LML models have high data dependency, consume more resources, and do not support streaming data. This paper proposes an online LML model (OAMC) to support streaming data with reduced ...
Online Polylingual Topic Models for Fast Document Translation Detection
Many tasks in NLP and IR require efficient document similarity computations. Beyond their common application to exploratory data analysis, latent variable topic models have been used to represent text in a low-dimensional space, independent of vocabulary, where documents may be compared. This paper focuses on the task of searching a large multilingual collection for pairs of documents that are ...
Journal
Journal title: IEEE Transactions on Knowledge and Data Engineering
Year: 2016
ISSN: 1041-4347
DOI: 10.1109/tkde.2015.2492565